Alexandra Rottenkolber, updated by Vsevolod Suschevskiy
Published
June 12, 2025
1 Introduction & getting started
Welcome to the practical part of the network analysis day! In this session, you will get an introduction to how to represent a network in R (or Python, see the other script), how to calculate basic network descriptives (on the micro, meso, and macro level), and how to put on the “network thinking hat”. You are invited to work through the exercise sheet on your own or in groups, with help from Carl and me where needed. If you are already familiar with some aspects of the lab, feel free to skip certain sections or go straight to the exercise part.
Learning goals for this workshop
Introduction
Represent a network in R (with IGRAPH)
Create a graph from scratch (add nodes, edges, attributes)
Visualise it
Basic network descriptives
macro-level descriptives
meso-level descriptives
micro-level descriptives
Network thinking
Majority illusion
Friendship paradox
1.1 Introduction
Getting started. First, we need to install the most important packages for network analysis in R.
There are plenty of R packages for network analysis available, which are tailored to perform different types of analysis. The most important ones for the beginning are
“sna”
“igraph”
“network”
“ggraph” for visualisations (uses ggplot2)
“egor” for ego-nets (uses igraph objects)
“netseg” for assortativity patterns, homophily, segregation, within-group mixing, etc.
“isnar”
“networkdata” a large sample of networks in igraph format (to install network data, see: https://github.com/schochastics/networkdata)
“igraphdata” several network datasets
“tidygraph” for tidy manipulation of igraph objects
etc.
To install the packages you need, you can simply pass the package names to install.packages(), e.g.

install.packages(c('sna', 'igraph'))
install.packages("here")
1.2 Creating a graph object
The most widely used R packages for (simple) network analysis are called sna and igraph. To create a graph object, SNA works with matrices as inputs, while IGRAPH requires igraph objects. You can find information on the basic IGRAPH data types here.
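The chunk that actually builds `mtx` is not shown in this excerpt. A minimal sketch could look like the following; the 0/1 values are taken from the adjacency matrix printed below, and loading igraph here is an assumption for the later chunks:

```r
library(igraph)  # assumed to be loaded for the graph chunks further below

# A 7x7 binary contact matrix; values copied from the matrix printed below
mtx <- matrix(
  c(0, 1, 0, 0, 0, 0, 0,
    1, 0, 1, 1, 1, 0, 0,
    1, 1, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 1, 0, 0,
    0, 1, 0, 1, 0, 1, 1,
    0, 0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 1, 1, 0),
  nrow = 7, byrow = TRUE
)
```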
dim(mtx) # Shows dimensions of the matrix. Here, we have a 7x7 matrix.
[1] 7 7
# Usually, we would like to label the dimensions
dimnames(mtx) <- list(c('Kira', 'Amaya', 'Rohan', 'Robin', 'Hanna', 'Adam', 'Igor'),
                      c('Kira', 'Amaya', 'Rohan', 'David', 'Hanna', 'Adam', 'Igor'))
mtx
Kira Amaya Rohan David Hanna Adam Igor
Kira 0 1 0 0 0 0 0
Amaya 1 0 1 1 1 0 0
Rohan 1 1 0 0 0 0 0
Robin 0 1 0 0 1 0 0
Hanna 0 1 0 1 0 1 1
Adam 0 0 0 0 1 0 1
Igor 0 0 0 0 1 1 0
# We can access each element of a matrix separately
mtx[1, 2] # first row, second column
[1] 1
mtx['Adam','Igor'] # sixth row, seventh column
[1] 1
# And we can replace elements by assigning a new value
mtx[3, 1] <- 0 # Assigns 0 to the element in the 3rd row, 1st column
mtx
Kira Amaya Rohan David Hanna Adam Igor
Kira 0 1 0 0 0 0 0
Amaya 1 0 1 1 1 0 0
Rohan 0 1 0 0 0 0 0
Robin 0 1 0 0 1 0 0
Hanna 0 1 0 1 0 1 1
Adam 0 0 0 0 1 0 1
Igor 0 0 0 0 1 1 0
# In social network analysis, the diagonal of a matrix is often set to NA.
# If done so, it indicates that self-loops cannot exist by definition
# and aren't just absent from the data.
diag(mtx) # Returns the elements in positions (1,1), ..., (n,n)
[1] 0 0 0 0 0 0 0
diag(mtx) <- NA
mtx
Kira Amaya Rohan David Hanna Adam Igor
Kira NA 1 0 0 0 0 0
Amaya 1 NA 1 1 1 0 0
Rohan 0 1 NA 0 0 0 0
Robin 0 1 0 NA 1 0 0
Hanna 0 1 0 1 NA 1 1
Adam 0 0 0 0 1 NA 1
Igor 0 0 0 0 1 1 NA
This matrix could already be interpreted as representing a network (this format is called a network’s adjacency matrix). Say the values indicate whether a person i (in the rows) is in regular contact with another person j (in the columns). How would you interpret the network?
1.2.1 Visualising a network
gplot(mtx, displaylabels = TRUE)

#plot.igraph(mtx) # this will throw an error
grph <- mtx |>
  graph_from_adjacency_matrix(mode = 'undirected') # create a graph object from our matrix
Warning: The `adjmatrix` argument of `graph_from_adjacency_matrix()` must be symmetric
with mode = "undirected" as of igraph 1.6.0.
ℹ Use mode = "max" to achieve the original behavior.
# Using the igraph library
grph |> plot.igraph()
# Using the ggraph library
set.seed(42)
grph |>
  ggraph() +
  geom_edge_link() +
  geom_node_label(aes(label = name)) +
  theme_void() +
  ggtitle("Network visualisation with ggraph")
Using "stress" as default layout
1.2.2 Adding nodes and edges
There are several ways to create a graph from scratch: you can set up an adjacency matrix as shown above, use an edge list as input, or add nodes and edges separately, as shown below.
# use an edge list as an input for an undirected graph
g_undirected <- graph_from_literal(1-2, 2-3, 2-4, 2-5, 4-5, 5-6, 5-7, 6-7)
g_undirected
# or start with an empty graph and add nodes and edges separately
g_undirected_2 <- make_empty_graph(directed = FALSE) # empty graph
g_undirected_2 <- g_undirected_2 |> add_vertices(2) # add nodes
plot(g_undirected_2)
# you can also use the pipe operator and do several steps at once
g_undirected_2 <- g_undirected_2 |>
  add_vertices(7, color = "red") |> # you can input attributes in this statement, too
  add_edges(c(2,6, 7,8, 7,9, 8,9))
plot(g_undirected_2)
# For a directed graph:
# use -+, +-, or ++ to indicate the arrow ends (with a +)
g_directed <- graph_from_literal(1+-2, 2-+3, 2+-4, 2++5, 4+-5, 5-+6, 5++7, 6++7) # plus sign captures the arrow end
g_directed
For the next few sections, we will use Zachary’s Karate Club graph as an example network. Zachary’s Karate Club is a very famous network from the social network analysis discipline (this is the paper that made it famous, this is the paper (written by W. Zachary) where it stems from).
In his study, Zachary observed the friendship ties of members of a university’s karate club over the duration of two years. During this time period, a disagreement occurred between the administrator of the club and the club’s instructor, which led to the instructor leaving the club. He eventually founded a new club taking half of the original club’s members with him. Based on the structure of the friendship network, Zachary was able to predict almost exactly which of the two clubs people would join after the split.
This network is so famous that it can be pulled from IGRAPHDATA. You can simply call it by invoking data(karate).
2 Basic network descriptives
2.1 Macro level (global level):
Summary statistics, such as:
- size, average degree, and degree distribution
- average clustering
- transitivity
2.1.1 Size and degrees
data(karate) # data is loaded into a variable called karate
# This graph was created by an old(er) igraph version.
karate <- karate |>
  igraph::upgrade_graph() |>
  as_tbl_graph()
plot(karate)
# number of nodes
#V(karate)
vcount(karate)
[1] 34
# number of edges
#E(karate)
ecount(karate)
[1] 78
# degrees
#igraph::degree(karate)
# degree distribution
hist(igraph::degree(karate))
2.1.2 Transitivity and average clustering coefficient
Other useful descriptives at the macro level are the transitivity coefficient and the average clustering coefficient (clustering coefficients also exist at the local (node) level). Transitivity describes how many of the existing triads are actually closed. The average clustering coefficient is – as the name already indicates – the average of all nodes’ clustering coefficients. The node-level clustering coefficient describes the fraction of possible triangles through a node that actually exist.
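The chunks below use the sna package for these quantities. As a cross-check, the same macro-level descriptives can be sketched with IGRAPH's own transitivity() function (a sketch, assuming the karate dataset from IGRAPHDATA is loaded):

```r
library(igraph)
library(igraphdata)
data(karate)

transitivity(karate, type = "global")       # share of closed triads in the whole graph
transitivity(karate, type = "average")      # average of the node-level clustering coefficients
head(transitivity(karate, type = "local"))  # node-level clustering coefficients
```

Note that the global transitivity and the average clustering coefficient generally differ, because the former weights high-degree nodes more heavily.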
adj_mtx_karate <- karate |>
  as_adjacency_matrix(sparse = igraph_opt("sparsematrices")) |>
  as.matrix() # convert igraph object to matrix (as sna package functions take matrices as input)
sna::isolates(adj_mtx_karate) # Check for isolates
integer(0)
sna::gden(adj_mtx_karate) # Density (Number of actual ties over Number of potential ties)
[1] 0.1390374
# grecip(adj_mtx_karate, measure='edgewise') # For directed graphs one can check for reciprocity (number of mutual ties over number of existing ties)
sna::gtrans(adj_mtx_karate, measure='weak') # Transitivity (how many of the existing relationships are closed (triangles))
[1] 0.2556818
sna::gtrans(adj_mtx_karate, measure='strong')
[1] 0.8465909
# Dyadic and triadic configurations
sna::dyad.census(adj_mtx_karate) # number of dyads that are mutual, asymmetric, or null (no tie)
2.2 Meso level:
Characteristics of groups of nodes live at the meso scale, such as:
- community detection
- homophily
- assortativity
2.2.1 Community detection
Community detection is a very large field of research and has received a lot of attention in the past. The goal of community detection is to identify a network’s meso-scale organisation. In simplistic terms, a community is a group of nodes that are somewhat more related to each other than to the rest of the network. Community detection for networks is conceptually similar to data clustering in machine learning. It is helpful if one wants to find nodes that would, for example, react similarly to an external stimulus, or if one wants to visualise the meso-level organisation of a network.
There are many different algorithms out there to find communities: some use a group’s internal density, some the similarity to neighbours, others the idea of random walks, and so on. As these approaches differ in their internal logic, they might yield slightly different results.
IGRAPH has some community detection algorithms built-in, which you can find here, and which we will use in the following.
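Because different algorithms may return different partitions, it can be useful to quantify their agreement. A sketch using igraph's compare(), which measures the similarity of two membership vectors (here via Normalised Mutual Information; the choice of the two algorithms is just for illustration):

```r
library(igraph)
library(igraphdata)
data(karate)

gn <- cluster_edge_betweenness(karate)  # Girvan-Newman
lv <- cluster_louvain(karate)           # Louvain (modularity optimisation)

# NMI: 1 = identical partitions, 0 = statistically independent partitions
compare(membership(gn), membership(lv), method = "nmi")

# Agreement of a detected partition with the observed split after the conflict
compare(membership(lv), V(karate)$Faction, method = "nmi")
```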
# CLUSTERING ALGORITHMS

# GIRVAN-NEWMAN
# Partitioning is based on edge betweenness.
gn_clustering <- karate |>
  cluster_edge_betweenness(modularity = TRUE, membership = TRUE)
Warning in cluster_edge_betweenness(karate, modularity = TRUE, membership =
TRUE): At vendor/cigraph/src/community/edge_betweenness.c:498 : Membership
vector will be selected based on the highest modularity score.
# Communities are identified as components in the edge-pruned graph.
# The partitioning where the modularity is the highest is the one that gets chosen at the end.

# Add a node attribute for visual color based on the original 'Faction'.
# Faction is 1 or 2. We'll map this to 'gold' and 'steelblue'.
V(karate)$node_visual_color <- ifelse(V(karate)$Faction == 1, 'gold', 'steelblue')

# Add a unique node identifier, crucial for joining data later.
# Ensure it's character type for robust joins.
V(karate)$node_identifier <- as.character(1:vcount(karate))

# WALKTRAP
# The walktrap algorithm is based on a series of short random walks.
# Random walks are hypothesized to stay within the same community.
walk_clustering <- cluster_walktrap(karate, steps = 10)
walk_clustering
[[1]]
+ 34/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8
[9] Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16
[17] Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24
[25] Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
[33] Actor 33 John A
[[2]]
+ 28/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 8 Actor 9 Actor 10 Actor 13
[9] Actor 14 Actor 15 Actor 16 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22
[17] Actor 23 Actor 24 Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30
[25] Actor 31 Actor 32 Actor 33 John A
[[3]]
+ 6/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 5 Actor 6 Actor 7 Actor 11 Actor 17
[[4]]
+ 5/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 8
[[5]]
+ 7/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 2 Actor 3 Actor 9 Actor 31 Actor 33 John A
[[6]]
+ 5/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 5 Actor 6 Actor 7 Actor 11
[[7]]
+ 5/34 vertices, named, from 4b458a1:
[1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 14
[[8]]
+ 10/34 vertices, named, from 4b458a1:
[1] Actor 3 Actor 24 Actor 25 Actor 26 Actor 28 Actor 29 Actor 30 Actor 32
[9] Actor 33 John A
# Other clustering algorithms that you could use:
cluster_louvain(karate)
# Add clustering to the visualisation
set.seed(42)
set_layout <- create_layout(karate, layout = 'fr')

set_layout |>
  pivot_longer(cols = c(gn_group, wt_group, fg_group, lu_group, le_group), # Columns to pivot
               names_to = "algorithm",                # New column for algorithm names
               values_to = "cluster_membership_id"    # New column for cluster IDs for that algorithm
  ) -> pivoted_plotting_data

set_layout |>
  ggraph() +
  geom_edge_link(alpha = 0.4, colour = 'grey60') + # Slightly darker grey for better visibility
  # Draw nodes (common to all facets)
  # Node color is from 'node_visual_color' (gold/steelblue based on Faction)
  # This aes() is evaluated against graph_layout_with_all_attrs
  geom_node_point(aes(colour = node_visual_color), size = 3.5, show.legend = FALSE) +
  scale_colour_identity() + # Because node_visual_color directly specifies the colors
  # Draw community hulls using ggforce::geom_mark_hull
  # This layer uses 'pivoted_plotting_data', which is already faceted by 'algorithm'.
  # The aes() here is evaluated against pivoted_plotting_data.
  geom_mark_hull(data = pivoted_plotting_data,
                 aes(x = x, y = y, group = cluster_membership_id, fill = cluster_membership_id),
                 colour = NA,              # No border for the hull polygons
                 alpha = 0.35,             # Hull transparency
                 concavity = 4,            # Adjust concavity
                 expand = unit(2.5, 'mm'), # Padding around groups
                 show.legend = FALSE       # No legend for hull fill colors (cluster IDs)
  ) +
  facet_wrap(~algorithm, ncol = 3) + # Arrange in a grid, adjust ncol as needed
  labs(title = "Zachary Karate Club Network: Community Detection Comparison",
       subtitle = "Nodes colored by original faction. Hulls show algorithm-detected communities.",
       caption = "Layout: Fruchterman-Reingold") +
  theme_graph(base_family = 'sans',  # Clean font
              strip_text_size = 10,  # Size for facet titles (algorithm names)
              plot_margin = margin(5, 5, 5, 5)) # Add some margin around the plot
What did you observe?
Probably you found that the results are quite different. Was there even a single pair of algorithms that returned exactly the same partitions?
There are different ways to continue from this finding: One option is to evaluate the partitioning based on some quality measure (e.g. the “modularity score”, or the “Normalised Mutual Information”). Another option would be to apply “consensus clustering” – an approach inspired by an observation from machine learning which can be summarised as ‘averaging several simple models often yields better accuracy than constructing the most sophisticated model possible’. The idea is very simple: Run several clustering algorithms (also the same one many times) and average the results. However, this method should not be applied blindly: Make sure the results you are averaging stem from an internally consistent ensemble of clustering algorithms (i.e. the same “family”, meaning: all algorithms apply the same logic. For example, don’t mix flow-based (infomap and co) with density optimization algorithms, etc.).
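The consensus idea can be sketched in a few lines. This is a toy illustration, not the full consensus-clustering procedure from the literature; it assumes Louvain as the (stochastic) base algorithm and an assumed majority threshold of 0.5, and on a graph this small the runs may largely agree:

```r
library(igraph)
library(igraphdata)
data(karate)

n_runs <- 100
n <- vcount(karate)
consensus <- matrix(0, n, n)

# Co-assignment matrix: how often does each pair of nodes land in the same community?
for (i in seq_len(n_runs)) {
  m <- membership(cluster_louvain(karate))
  consensus <- consensus + outer(m, m, "==")
}
consensus <- consensus / n_runs

# Keep only pairs that co-occur in a majority of runs, then cluster the consensus graph
g_cons <- graph_from_adjacency_matrix((consensus > 0.5) * 1,
                                      mode = "undirected", diag = FALSE)
membership(cluster_louvain(g_cons))
```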
2.2.2 Assortativity
At the meso level, we could also be interested in quantifying the extent to which homophily drives a network’s connections. This is where the assortativity coefficient can help us. The idea behind this measure is to compare the number of ties that occur in-group versus those that bridge to another group, relative to the total number of ties. The assortativity coefficient, however, is a bit more advanced than this simple illustration: it can take different group sizes and more than two groups into account. A value of 1 means perfect assortativity (more in-group than between-group ties), and -1 means perfect disassortativity (more between-group than in-group ties).
# Assortativity (for continuous (quantitative) attributes)
assortativity_degree(karate, directed = FALSE) # degree assortativity (Finding: high-degree nodes tend to connect to low-degree nodes)
[1] -0.4756131
# Assortativity for qualitative attributes
assortativity(karate, values = V(karate)$color, directed = FALSE)
[1] 0.7434211
Here we find the karate network to be assortative by ‘support’, meaning that there is not too much mixing between the support groups.
2.3 Micro level (local level):
Node-level characteristics, such as:
- centrality measures (degree centrality, betweenness centrality, closeness centrality, PageRank)
- nodes’ degrees
On the micro-level, we are usually interested in the question: Which are the important nodes? In which way are they important?
To answer this question, we could start looking at the degree of a node, assuming that the ones with the most links are most likely the most important ones. But what if we want to measure another “type of importance”, e.g. if a node has a bridging functionality (critical for example, for information flow, exchange, exposure, etc.)? Such a node probably has a very small degree, but if it were gone, the topology of the network would be substantially different.
To capture these different qualities, we have different measures at hand (usually summarised as centrality measures). The most important ones are closeness centrality, betweenness centrality, PageRank, and the local clustering coefficient. IGRAPH and SNA have the most important algorithms readily implemented.
# Centrality measures
degree(karate) # degree
Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8
16 9 10 6 3 4 4 4
Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16
5 2 3 1 2 5 2 2
Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24
2 2 2 3 2 2 2 5
Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
3 3 2 4 3 4 4 6
Actor 33 John A
12 17
betweenness(karate) # betweenness centrality
Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7
250.150000 33.800000 36.650000 1.333333 0.500000 15.500000 15.500000
Actor 8 Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14
0.000000 13.100000 7.283333 0.500000 0.000000 0.000000 1.200000
Actor 15 Actor 16 Actor 17 Actor 18 Actor 19 Actor 20 Actor 21
0.000000 0.000000 0.000000 16.100000 3.000000 127.066667 0.000000
Actor 22 Actor 23 Actor 24 Actor 25 Actor 26 Actor 27 Actor 28
0.000000 0.000000 1.000000 33.833333 0.500000 0.000000 6.500000
Actor 29 Actor 30 Actor 31 Actor 32 Actor 33 John A
10.100000 0.000000 3.000000 66.333333 38.133333 209.500000
#closeness(karate) # closeness centrality
#page_rank(karate)$vector # pagerank
#transitivity(karate, type = "local") # local clustering coefficient

# Assign centrality scores as attributes; the example here is betweenness centrality
karate |>
  activate(nodes) |>
  mutate(degree = centrality_degree(),
         betweenness = centrality_betweenness(),
         closeness = centrality_closeness(),
         pagerank = centrality_pagerank()) -> karate

# Visualisation
hist(V(karate)$betweenness |> as.numeric())
3 Network thinking
This section aims at stimulating your network thinking. You will discover two network peculiarities that might appear counterintuitive at first glance, but make a lot of sense once you take the networks’ topologies into consideration.
3.1 Majority illusion
In networks where we observe strong homophily, it might happen that we observe something like the “illusion of a majority”: Even if the majority does not hold a certain characteristic, we can draw a network in which most people believe the opposite is true.
Let’s look at an easy example. We can use our simple friendship network from above, which is in the variable g_undirected. In this network, nodes represent friends who are assigned one attribute (their gender).
To test for the majority illusion, we need to compare the average of the gender shares in all ego networks to the global gender share. Hence, we first have to iterate over all ego networks (one network per friend), remove the ego node for each ego network, and calculate the share of female friends a person “sees” in their ego network. Then, we need to average the perceived gender share among all nodes. In a second step, we need to calculate the gender shares in the friendship network from a global perspective.
Let’s extract an ego network for one person, for example, Amaya
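A sketch of this step with igraph's make_ego_graph(). The chunk that builds g_undirected with name and gender attributes is not shown in this excerpt, so the stand-in network and gender values below are assumptions for illustration only:

```r
library(igraph)

# Stand-in for 'g_undirected'; edge list and gender values are assumed, not the course data
g_undirected <- graph_from_literal(Kira-Amaya, Amaya-Rohan, Amaya-Robin,
                                   Amaya-Hanna, Robin-Hanna, Hanna-Adam,
                                   Hanna-Igor, Adam-Igor)
# Vertices appear in order of first mention: Kira, Amaya, Rohan, Robin, Hanna, Adam, Igor
V(g_undirected)$gender <- c("f", "f", "m", "d", "f", "m", "m")  # assumed values

# Extract Amaya's order-1 ego network (make_ego_graph returns a list)
ego_amaya <- make_ego_graph(g_undirected, order = 1, nodes = "Amaya")[[1]]

# Remove the ego herself before computing what Amaya "sees" among her friends
ego_amaya <- delete_vertices(ego_amaya, "Amaya")
prop.table(table(V(ego_amaya)$gender))
```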
# calculation of gender shares in terms of local averages
avg_f_gender = 0
avg_d_gender = 0
avg_m_gender = 0

genders <- V(g_undirected)$gender
names <- V(g_undirected)$name

mtx_friends <- g_undirected |>
  as_adjacency_matrix(sparse = igraph_opt("sparsematrices")) |>
  as.matrix()

ego_networks <- sna::ego.extract(mtx_friends, ego = NULL, neighborhood = c("combined"))

for (ego_node in V(g_undirected)$name) {
  idx <- which(names == ego_node)
  friends_of_ego <- colnames(ego_networks[[idx]])[colnames(ego_networks[[idx]]) != ego_node]

  # get gender shares in ego network
  proportions = prop.table(table(V(g_undirected)$gender[which(V(g_undirected)$name %in% friends_of_ego)]))

  if (is.na(as.table(proportions)["d"]) == FALSE) {
    avg_d_gender <- avg_d_gender + as.table(proportions)[["d"]]
  }
  if (is.na(as.table(proportions)["f"]) == FALSE) {
    avg_f_gender <- avg_f_gender + as.table(proportions)[["f"]]
  }
  if (is.na(as.table(proportions)["m"]) == FALSE) {
    avg_m_gender <- avg_m_gender + as.table(proportions)[["m"]]
  }
}

# Extract results
cat(paste("The average node thinks that", round(100 * avg_f_gender / length(names)), "% of the network is female,",
          round(100 * avg_d_gender / length(names)), "% diverse, and",
          round(100 * avg_m_gender / length(names)), "% male."))
The average node thinks that 68 % of the network is female, 7 % diverse, and 25 % male.
# calculation of the gender shares from a global perspective
global_absolut_f = 0
global_absolut_d = 0
global_absolut_m = 0

for (node in V(g_undirected)$name) { # Iterate over all nodes
  if (V(g_undirected)[node]$gender == "f") { # Count the number of female nodes
    global_absolut_f <- global_absolut_f + 1
  }
  if (V(g_undirected)[node]$gender == "d") { # Count the number of diverse nodes
    global_absolut_d <- global_absolut_d + 1
  }
  if (V(g_undirected)[node]$gender == "m") { # Count the number of male nodes
    global_absolut_m <- global_absolut_m + 1
  }
}

# Extract results
cat(paste("From a global perspective,", round(100 * global_absolut_f / length(names)), "% of the network is female,",
          round(100 * global_absolut_d / length(names)), "% diverse, and",
          round(100 * global_absolut_m / length(names)), "% male."))
From a global perspective, 43 % of the network is female, 14 % diverse, and 43 % male.
3.1.1 Exercise 1:
What do you observe? Why do you see what you see?
YOUR COMMENTS HERE
3.1.2 Exercise 2 (Extra): Reflections on the majority illusion
Which implications does this phenomenon, in your opinion, have for social cohesion? Or the “public opinion”? Political influence? …?
The majority illusion is only one peculiarity. Another striking case, driven by homophily, that I want to bring to your attention is that even mild preferences often result in very strict homophilic patterns (along the lines of “the aggregate is more than the sum (of individual-level preferences)”). One of the most famous examples illustrating this is segregation (see the work of Nobel Prize-winning game theorist Thomas Schelling, and especially his paper Dynamic models of segregation from 1971). You can play an interactive version of the paper’s content here: Parable of Polygons. Even if people are happy being in the minority, the groups end up being segregated. Comment: There is no network in this interactive post, but one could easily transfer this to a friendship network where people are allowed to rewire their connections.
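The rewiring idea mentioned in the comment can be sketched as a toy simulation. Everything here is an illustrative assumption (network size, the 0.4 threshold, the update rule), not Schelling's original lattice model: agents with a mild preference for like-typed neighbours rewire one tie whenever their like-neighbour share falls below the threshold, and the network typically ends up far more segregated than the preference alone suggests.

```r
library(igraph)
set.seed(1)

# Toy setup: 60 agents of two types on a sparse random graph
g <- sample_gnp(60, 0.1)
V(g)$type <- rep(c("A", "B"), each = 30)
threshold <- 0.4  # agents want at least 40% like-typed neighbours (a mild preference)

for (step in 1:2000) {
  v <- sample(vcount(g), 1)                  # pick a random agent
  nb <- as.integer(neighbors(g, v))
  if (length(nb) == 0) next
  share_like <- mean(V(g)$type[nb] == V(g)$type[v])
  if (share_like < threshold) {
    # drop one unlike neighbour and connect to a random like-typed agent instead
    unlike <- nb[V(g)$type[nb] != V(g)$type[v]]
    new_friend <- sample(which(V(g)$type == V(g)$type[v]), 1)
    if (new_friend != v && !are_adjacent(g, v, new_friend)) {
      g <- delete_edges(g, get_edge_ids(g, c(v, unlike[1])))
      g <- add_edges(g, c(v, new_friend))
    }
  }
}

# Assortativity by type typically ends up well above what the mild preference implies
assortativity_nominal(g, as.integer(factor(V(g)$type)), directed = FALSE)
```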
YOUR COMMENTS HERE
3.1.3 Exercise 3: Blog posts
For this exercise, we will use some (a bit more exciting) real-world data. Specifically, the dataset from this article. The dataset contains front-page hyperlinks (edges) between blogs (nodes) in the context of the 2004 US election (directed network). The dataset is available here.
Load the data (edges stored in edges_blogs.txt) and assign the labels (stored in node_labels_blogs.txt) to every node.
Repeat the steps as in Exercise 1. Do you find a majority illusion here, too?
node_names <- V(g_blogs)$name

avg_right_leaning_tidy <- node_names |>
  purrr::map_dbl(function(current_ego_name) {
    # 1. Get the names of the alters (neighbors) for the current_ego_name
    alter_node_names <- names(neighbors(g_blogs, current_ego_name))

    # If the ego node has no alters, its contribution to the sum is 0
    if (length(alter_node_names) == 0) {
      return(0.0)
    }

    # 2. Filter 'df2_nodes_attributes' for these alters and get their attributes.
    # We also filter out NA attributes to match the default behavior of `table()`.
    alters_attributes_df <- df2_nodes_attributes |>
      filter(as.character(node) %in% alter_node_names) |>
      filter(!is.na(attribute)) # Exclude rows where the attribute itself is NA

    # If no alters are found in the attributes dataframe or all their attributes are NA, contribution is 0
    if (nrow(alters_attributes_df) == 0) {
      return(0.0)
    }

    # 3. Calculate the proportions of each attribute among the alters
    proportions_summary <- alters_attributes_df |>
      count(attribute, name = "n_val") |>          # Count occurrences of each attribute
      mutate(proportion = n_val / sum(n_val))      # Calculate proportion for each attribute

    # 4. Extract the proportion for the "right-leaning" attribute
    right_leaning_prop_value <- proportions_summary |>
      filter(attribute == "right-leaning") |>
      pull(proportion) # Extract the proportion value

    # If the "right-leaning" attribute was not found among alters, its proportion is 0
    if (length(right_leaning_prop_value) == 0) {
      return(0.0)
    } else {
      # pull() might return multiple values if 'attribute' wasn't unique after count (not the case here)
      # We take the first (and should be only) value.
      return(right_leaning_prop_value[1])
    }
  }) |>
  sum(na.rm = TRUE)

# Extract results
cat(paste("The average node thinks that",
          round(100 * avg_right_leaning_tidy / vcount(g_blogs), 2),
          "% of the network is right-leaning."))
The average node thinks that 53.61 % of the network is right-leaning.
# global perspective
prop.table(table(V(g_blogs)$leaning))
left-leaning right-leaning
0.4803922 0.5196078
cat(paste("From a global perspective,",
          round(prop.table(table(V(g_blogs)$leaning))[["right-leaning"]] * 100, 2),
          "% of the nodes are right-leaning."))
From a global perspective, 51.96 % of the nodes are right-leaning.
Even though both political blog communities are divided, the majority illusion does not seem to be present in this case. Both the averaged ego perspective and the global network perspective lead to similar results with respect to the share of nodes that are right-leaning.
However, the high attribute assortativity coefficient indicates that blogs tend to reference those like them with regard to political orientation.
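The coefficient referred to here is not computed in the excerpt above. A sketch of how one might obtain it, assuming g_blogs with its leaning vertex attribute from the exercise setup:

```r
library(igraph)

# 'g_blogs' with a V(g_blogs)$leaning attribute is assumed from the exercise setup.
# assortativity_nominal() expects integer type codes, hence the factor conversion.
assortativity_nominal(g_blogs,
                      types = as.integer(factor(V(g_blogs)$leaning)),
                      directed = TRUE)
```

Values close to 1 would confirm that blogs overwhelmingly link within their own political camp.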
3.2 The friendship paradox
3.2.1 Exercise 4:
It’s time for the next peculiarity of network effects: The friendship paradox.
For this exercise, we will use the data from the following paper: Viswanath, B., Mislove, A., Cha, M., & Gummadi, K. P. (2009). On the evolution of user interaction in Facebook. Proceedings of the 2nd ACM Workshop on Online Social Networks, 37–42.
You can find it in the data folder under the name data-facebook.txt.
Read in the data, and construct your graph.
This time, we are not interested in the assortativity by age, but rather in assortativity by number of friends (i.e. the degree assortativity). Generate a plot that shows the average number of the friends of a person’s friends against the number of this person’s friends. Add the identity line (line that goes through (0,0) and (1,1)).
Persons above the identity line have fewer friends than their average neighbour, while nodes below the identity line have more. The friendship paradox states that most nodes have fewer friends than their friends’ average. Can you check whether the friendship paradox is also at play in the Facebook network? (Hint: One needs to count the number of nodes above and below the identity line and to compare the size of both groups.)
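As a shortcut before building the full plot: igraph's knn() returns each node's average neighbour degree, which is exactly the y-axis quantity described above. A sketch, assuming g_facebook has already been constructed from data-facebook.txt:

```r
library(igraph)

# 'g_facebook' is assumed to be the graph read from data-facebook.txt
deg <- degree(g_facebook)
avg_nb_deg <- knn(g_facebook)$knn  # average degree of each node's neighbours

# Share of nodes with fewer friends than their friends' average
# (above the identity line); a value well over 0.5 indicates the paradox
mean(deg < avg_nb_deg, na.rm = TRUE)
```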
# Get the degrees of the nodes
g_facebook |>
  activate(nodes) |>
  mutate(degree = centrality_degree()) -> g_facebook

neighbors_df <- g_facebook |>
  activate(edges) |>
  mutate(ego_degree = .N()$degree[from],
         alter_degree = .N()$degree[to])

neighbors_df |>
  as_tibble() |>
  rename(ego = from, alters = to,
         ego_degree = ego_degree,
         alter_degree = alter_degree) -> merged_df

# calculate how many friends an ego's friends have on average
avg_friends_df <- merged_df |>
  group_by(ego) |>
  summarise(ego_degree = min(ego_degree),
            avg_degree_neighbor = mean(alter_degree)) |>
  na.omit()

# Plot the data
avg_friends_df |>
  slice_sample(n = 2000) |>
  ggplot(aes(x = ego_degree, y = avg_degree_neighbor)) +
  geom_abline(slope = 1, intercept = 0, color = "darkgreen", linetype = "dashed") +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 1) +
  geom_point(fill = NA, size = 3, alpha = 0.1, color = "black", shape = 21, stroke = 1) +
  scale_x_log10(limits = c(1, 1000)) +
  scale_y_log10(limits = c(1, 250)) +
  labs(x = "Number of friends",
       y = "Number of friends of the average neighbour",
       title = "Number of friendships",
       caption = "Sample of 2000") +
  theme_minimal() +
  scale_fill_continuous(limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1),
                        type = "viridis", name = "Density",
                        guide = guide_colourbar(direction = "horizontal")) +
  theme(legend.position = c(0.7, 0.2),
        legend.title.position = "top",
        legend.ticks = element_blank())
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.
# Count the number of nodes above and below the identity line
paradox_df <- merged_df |>
  group_by(ego) |>
  summarise(avg_degree_neighbor = mean(alter_degree),
            ego_degree = mean(ego_degree))

paradox_df <- paradox_df |>
  mutate(ego_more_than_alters = if_else(ego_degree > avg_degree_neighbor, "yes", "no"))

proportions <- prop.table(table(paradox_df$ego_more_than_alters))

# Check for the friendship paradox
if (proportions["no"] > proportions["yes"]) {
  print("We found a friendship paradox.")
} else {
  print("We did not observe a friendship paradox.")
}
[1] "We found a friendship paradox."
The friendship paradox is the observation that the degrees of the neighbours of a node in a network are, on average, greater than the degree of the node itself. In other words, your friends have more friends than you do. We see evidence for this in the data: if there were no degree assortativity in the network, points in the plot would scatter around the identity line. Instead, most points in the low and intermediate range (people having one to around 80 friends) tend to be connected to individuals who have more friends. The most popular individuals in the network (100 friends and more) tend to be connected to individuals who have fewer friends; in other words, they have many ‘followers’ who are not so well connected.